MOODS: fast search for position weight matrix matches in DNA sequences
Abstract
MOODS (MOtif Occurrence Detection Suite) is a software package for matching position weight matrices (PWMs) against DNA sequences. MOODS implements state-of-the-art online matching algorithms and scans considerably faster than a simple brute-force search. It is written in C++, with bindings for the popular BioPerl and Biopython toolkits. It can easily be adapted for different purposes and integrated into existing workflows, and it can also be used as a C++ library.

Availability: The package, with documentation and usage examples, is available at http://www.cs.helsinki.fi/group/pssmfind. The source code is also available under the terms of the GNU General Public License (GPL).
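To make the speed claim concrete, here is a minimal sketch contrasting the brute-force baseline with a simple lookahead (early-termination) bound, a classic speed-up for PWM scanning. This is not MOODS code or its API, and it is not necessarily one of the algorithms MOODS implements. It assumes a hypothetical PWM format (one dict of log-odds scores per motif column), a sequence containing only A/C/G/T, and that a "match" is any window scoring at least a given threshold. The function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one dict per motif column, mapping each
# base (A, C, G, T) to its log-odds score. Not the MOODS data format.
PWM = List[Dict[str, float]]
Hit = Tuple[int, float]


def brute_force_scan(pwm: PWM, seq: str, threshold: float) -> List[Hit]:
    """Score every length-m window and report (start, score) for windows
    scoring at least `threshold`. Assumes `seq` contains only A/C/G/T."""
    m = len(pwm)
    hits: List[Hit] = []
    for i in range(len(seq) - m + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits


def lookahead_scan(pwm: PWM, seq: str, threshold: float) -> List[Hit]:
    """Return the same hits as brute_force_scan, but stop scoring a window
    as soon as even the best possible remaining columns cannot lift it to
    the threshold."""
    m = len(pwm)
    # best_suffix[j] = highest score achievable over columns j..m-1.
    best_suffix = [0.0] * (m + 1)
    for j in range(m - 1, -1, -1):
        best_suffix[j] = best_suffix[j + 1] + max(pwm[j].values())

    hits: List[Hit] = []
    for i in range(len(seq) - m + 1):
        score = 0.0
        for j in range(m):
            score += pwm[j][seq[i + j]]
            if score + best_suffix[j + 1] < threshold:
                break  # window cannot reach the threshold; skip the rest
        else:
            # Loop finished without pruning, so score >= threshold.
            hits.append((i, score))
    return hits


if __name__ == "__main__":
    # Toy 3-column PWM that rewards the word "ACG" (+1 per match, -1 otherwise).
    pwm: PWM = [
        {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
        {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
        {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    ]
    seq = "TACGACGTT"
    print(brute_force_scan(pwm, seq, 3.0))  # [(1, 3.0), (4, 3.0)]
    print(lookahead_scan(pwm, seq, 3.0))    # [(1, 3.0), (4, 3.0)]
```

Both functions report the same hits; the lookahead version only saves work on windows it can reject early, which typically covers most windows when the threshold is stringent.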